\section{Our technique}
\label{sec-our-technique}

\subsection{Function call}

A function call involves a first-class object called a \emph{function
  object} or a \emph{function} for short.  In general, a function may
refer to variables introduced in some outer scope, so that the
function is a \emph{closure}.  The (typically native) instructions to
be executed by the function must be able to refer to such
\emph{closed-over} variables.  But the values of such variables may
vary according to the flow of control at run time.  This situation is
handled by a compile-time procedure called \emph{closure conversion}
whereby a \emph{static environment} is determined for each function
object.  A function object thus consists at least of an \emph{entry
  point}, which is the address of the code to be executed (and which
is shared between all closures with the same code) and an object
representing the static environment (which is specific to each
function object).  A function call must therefore contain instructions
to access the static environment and put it in an agreed-upon place
(typically a register), before control is transferred to the entry
point.

This work covers function calls to functions that are \emph{named} at
the call site.  The most common such case is when the name of the
function appears in the operator position of a compound form.  Less
common cases include arguments of the form \texttt{(function}
\textit{name}\texttt{)} to some standard functions such as
\texttt{funcall} and \texttt{apply}.  In particular, expansions of the
\texttt{setf} macro are often of the form \texttt{(funcall (function
  (setf} \textit{symbol}\texttt{)) ...)}  because function names like
\texttt{(setf }\textit{symbol}\texttt{)} are not allowed in an
operator position.

In general, with such named function calls, the function associated
with the name can be altered at run time, or it can be made undefined
by the use of \texttt{fmakunbound}.  For that reason, the caller can
make no assumptions about the signature of the callee.  This issue is
solved by a standardized \emph{function-call protocol} that dictates
where the caller places the arguments it passes to the callee.

Thus, for the purpose of this work, we define a \emph{function call}
to be the code that accomplishes the following tasks:

\begin{enumerate}
\item It accesses the arguments to be passed to the callee from the
  places they have been stored after computation, and puts the
  arguments in the places where the callee expects them.
\item It accesses the function object associated with the name at the
  call site and stores it in some temporary location.
\item From the function object, it accesses the static environment to
  be passed to the code of the callee.
\item Also from the function object, it accesses the \emph{entry
  point} of the function, i.e., typically the address of the first
  instruction of the code of the callee.
\item It transfers control to the entry point, using an instruction
  that saves the return address for use by the callee to return to the
  caller.
\item Upon callee return, it accesses the return values from the places
  they have been stored by the callee, and puts those values in the
  places where the caller requires them for further computation.
\end{enumerate}

In a typical implementation, a function call is generated when the
code of the caller is compiled, and it then never changes.  As
mentioned above, for this permanent code to work, a particular
\emph{function-call protocol} must be observed, and that protocol must
be independent of the callee, as the callee may change after the
caller has been compiled.

Our technique optimizes function calls to functions in the
\emph{global environment} such that the \emph{name} of the callee is
known statically, i.e., at compile time.  There are three different
types of forms that correspond to this description and that are
considered in this work:

\begin{enumerate}
\item A function form where the operator is a symbol naming a function
  in the global environment, and that does not correspond to any of
  the following two form types.  We use the term \emph{ordinary
    function form} for this case.
\item A function form where the operator is the symbol
  \texttt{funcall} and the first argument is either a literal symbol
  or a \texttt{function} special form with a function name.
\item A function form where the operator is the symbol
  \texttt{apply} and the first argument is either a literal symbol
  or a \texttt{function} special form with a function name.
\end{enumerate}

\noindent
The first type of form can be considered as special syntax for a
\texttt{funcall} form with a constant function name.

There are some other cases that we do not intend to cover, in
particular a call to \texttt{multiple-value-call} with a named
function argument.  In fact, the third type of form could be
generalized to cover other functions that are commonly used with
a constant function argument, such as \texttt{mapcar}.  At the moment,
we are not considering such additional cases.

With our suggested technique, for these three different form types,
the function call is created by the callee.  We discuss each form type
separately.

\subsection{Ordinary function form}

The code emitted by the caller for a function call consists of a
single unconditional \emph{jump} instruction.  The target address in
that instruction is altered by the callee according to its structure.
The code for the function call is contained in an object that we call
a \emph{trampoline snippet}, or just \emph{snippet} for short.  The
callee allocates an appropriate snippet in the global heap as
described in \refSec{sec-sicl-features}, at some available location,
and the unconditional jump instruction of the caller is modified so
that it performs a jump to the first instruction of the snippet.  The
constellation of caller, callee, and snippet is illustrated in
\refFig{fig-snippet}.  We omitted an explicit indication of a control
transfer from the snippet to the callee code, because such a control
transfer is not always required.

\begin{figure}
\begin{center}
\inputfig{fig-snippet.pdf_t}
\end{center}
\caption{\label{fig-snippet}
Caller, callee, and snippet in the global heap.}
\end{figure}

When the callee changes in some way, a new snippet is allocated and
the jump instruction is altered to refer to the position of the new
snippet.  The old snippet is then subject to garbage collection like
any other object.  For the callee to be able to alter the caller this
way, a list of all call sites must be accessible from the name of the
callee.  For an ordinary \commonlisp{} implementation, the symbol used
to name the callee can store such a list.  In \sicl{}, this
information would be kept in the data structure describing the callee
in the first-class global environment
\cite{Strandh:2015:ELS:Environments}.  Either way, to avoid memory
leaks, the call site should be referred to through a weak reference.

When code containing a caller is loaded into the global environment,
and that caller contains a call site that refers to a function that is
not defined at the time the caller is loaded, a \emph{default snippet}
is created.  The default snippet contains the same instructions that a
traditional compiler would create for a call to a function that might
be redefined in the future.  Thus, the default snippet contains code
to put arguments in places dictated by the calling conventions, and
it accesses return values from predefined places.  It also accesses
the function indirectly, either through a symbol object (as most
\commonlisp{} systems probably do) or through a separate
\emph{function cell} as described in our paper on first-class global
environments \cite{Strandh:2015:ELS:Environments}.  The default
snippet is illustrated in \refFig{fig-default-snippet}.  The default
snippet is also used when the definition of the callee changes, as
described below.  A default snippet for each call site can either be
kept around, or allocated as needed.  The former situation is
advantageous for a callee with many call sites and for callees that
are frequently redefined, as it decreases the time to load a new
version of the callee.

\begin{figure}
\begin{center}
\inputfig{fig-default-snippet.pdf_t}
\end{center}
\caption{\label{fig-default-snippet}
Default snippet.}
\end{figure}

In order for the callee to be able to adapt the snippet to its
requirements, the caller, when loaded into the executing image, must
provide information about its call sites to the system.  Each call
site contains information such as:

\begin{itemize}
\item The name of the callee.
\item The number of arguments.
\item The type of each argument.  If the type is not known, it is
  indicated as \texttt{t}.  When an argument is a literal object, its
  type is indicated as \texttt{(eql ...)}.
\item For each argument, whether the argument is boxed or unboxed.
\item For each argument, its location.  The location can be a register
  or a stack position in the form of an offset from a frame pointer.
\item The number of required return values, or an indication that all
  return values are required, no matter the number.
\item In case of a fixed number of return values, for each such value,
  some limited information of the \emph{type} of each value.  See
  below for a more elaborate explanation of the restrictions involved
  for this information.
\item Also, in case of a fixed number of return values, for each such
  value, the location where the caller expects the value.
\item Indication as to whether the call is a \emph{tail call}, in
  which case the snippet should deallocate the frame before
  returning.
\end{itemize}

A callee can take advantage of this information to customize the
call.  The default action is to generate a snippet that implements the
full function-call protocol, without taking into account information
about the types of the arguments.

While our technique allows for information provided by the caller to
be taken into account by the callee in various ways, the opposite
direction is not generally possible.  The reason is that the callee
can change or be redefined in arbitrary ways, and the caller code is
fixed, so it can not adapt to such changes in the callee.  The
only place where some limited amount of adaptation is possible is in
the snippet, after the callee code returns.

A direct consequence to this one-directional dependency is that the
caller can not, in general, dictate the type of the return values.
The current callee will produce the values that its code dictates, no
matter what the caller needs.  However, it is quite advantageous to be
able to return unboxed values of certain types; in particular
full-word floating point numbers.  For that reason, we allow some
restricted type information to be provided by the caller.
Thus, if the caller indicates a type other than \texttt{t} for some
return value, it has to be one of a small number of fixed types, for
example \texttt{double-float}, \texttt{character},
\texttt{(signed-byte 64)} and \texttt{(unsigned-byte 64)} (assuming a
64-bit architecture).  When one of these types is indicated by the
caller, the meaning is that the caller requires an unboxed value of
this type.  Then, if the callee cannot supply such a value, code is
generated in the snippet to signal an error.

When a modification is made to a callee that alters its semantics,
care must be taken so as to respect the overall semantics of all
callers.  In particular, a callee can be removed using
\texttt{fmakunbound} or entirely replaced using \texttt{(setf
  fdefinition)}.  In that case, the following steps are taken in
order:

\begin{enumerate}
\item First, every call site is de-optimized, which means that a
  default snippet is allocated for each caller, or the kept default
  snippet is reused.  The unconditional jump instruction is modified
  to refer to the default snippet.  As previously explained, this
  snippet contains code for the full function-call protocol, and in
  particular, it accesses the callee using an indirection through the
  function cell.
\item Next, the callee is atomically replaced by a new function, or
  entirely removed by a single modification to the contents of the
  function cell.
\item The new function is attached to the list of call sites, and,
  depending on the nature of the new function, new snippets can then
  be allocated in order to improve performance of calls to the new
  function.
\end{enumerate}

\noindent
When new snippets are substituted, actions may be needed to ensure
that that processors do not use stale code.  Depending on the types of
processors involved, such actions include flushing instruction caches
and prefetch pipelines.

The thread responsible for redefining the callee, blocks until step 1
is accomplished.  Without this blocking, some callers may get the old
version of the callee and some others the new version, thereby
violating the overall semantics of a function redefinition.  In some
cases, it may be acceptable for different callers to get different
versions, but in the general case, i.e., when it is observable which
versions are used, it is not acceptable.  Because of this requirement,
redefining a function can be an expensive operation, but redefining a
function is expected to be infrequent compared to calling it.

Step 3, on the other hand can be accomplished asynchronously, and even
in parallel with caller threads, provided that appropriate
synchronization prevents subsequent simultaneous redefinitions of the
callee.

\subsection{\texttt{funcall} with known function name}

There are two subcases for this type of form:

\begin{enumerate}
\item The first argument is a special form \texttt{quote} with the
  argument being a symbol.  This case can occur as a result of the
  programmer wanting to avoid capture of the function name, and make
  sure the name refers to the function with that name in the global
  environment.  It can also occur in the expansion of a macro form.
\item The first argument is a special form \texttt{function}.  This
  case typically occurs as a result of expanding a \texttt{setf} macro
  form and the function name is then of the form \texttt{(setf
  }\textit{symbol}\texttt{)}.  The expansion uses \texttt{funcall}
  simply because a function name of this form can not be used as the
  operator of a function form.  This case can also occur in the
  expansion of a macro form.
\end{enumerate}

To handle this case, the compiler treats the symbol \texttt{funcall}
as a special operator.  If the first argument corresponds to any of
the two subcases, then the call is treated in the same way as an
ordinary function form.  Otherwise it generates a call to the function
\texttt{funcall}.

\subsection{\texttt{apply} with known function name}

As with \texttt{funcall}, the same two subcases exist for
\texttt{apply}, and for the same reason.  Again, the compiler treats
the symbol \texttt{apply} as a special operator and generates a call
to the function \texttt{apply} whenever the first argument is neither
the special form \texttt{quote} nor the special form
\texttt{function}.

However, the case of \texttt{apply} is of course more complex than
that of \texttt{funcall}.  Recall that \texttt{apply} takes at least
two arguments.  The first argument is a function designator as with
\texttt{funcall}.  The remaining arguments represent a
\emph{spreadable argument list designator}, which means that the last
argument is treated as a list of objects, and the arguments to the
callee are the objects in that list, preceded by the remaining
arguments to \texttt{apply}, in the order that they appear.

A very common subcase of this case is a call to \texttt{apply} with
exactly two arguments.  It is used when the execution of some code
results in a list of objects, and these objects must be passed as the
arguments to some function, in the order that they appear in the
list.  For this subcase, our technique can be used to avoid the
indirection to find the callee entry point as usual.  But it can also
be used to access the callee arguments directly from the list of
objects, so as to avoid unpacking the list to locations dictated by
the full call protocol.

A more interesting subcase is that of some intermediate function
wanting to override some, but not all of the keyword arguments that it
was passed, before calling the callee.  The remaining arguments to
\texttt{apply} are then typically keyword/value pairs.  Our technique
can then be used to avoid scanning the last argument to \texttt{apply}
for these keyword arguments.  Recall that the standard allows for
multiple occurrences of the same keyword argument in an argument list,
and that the first occurrence is then the one that is used.

Call-site information resulting from a call to \texttt{apply} must be
indicated as such, so that the call-site manager can process the
arguments as a spreadable argument list designator, rather than as an
ordinary suite of arguments.

%%  LocalWords:  callee redefinitions
